Abstract

We discuss a methodology to dynamically generate links among digital objects by means
of an unsupervised learning mechanism which analyzes user link traversal patterns. We
performed an experiment with a test bed of 150 complex data objects, referred to as buckets.
Each bucket manages its own content, provides methods to interact with users and
individually maintains a set of links to other buckets. We demonstrate that buckets were capable of
dynamically adjusting their links to other buckets according to user link selections, thereby
generating a meaningful network of bucket relations. Our results indicate that such adaptive
networks of linked buckets approximate the collective link preferences of a community of users.

1 Introduction 

Current research in the area of recommender systems has focused on analyzing static
representations of user preferences, e.g. lists of purchased items, to generate personalized
recommendations [8]. However, user preferences are not static and shift as users assume different
roles and interests. Furthermore, digital library applications often do not concern purchasable
items but complex information objects, and user interests must be inferred from less explicit
statements of interest. One way to infer user interests is to analyze the links
previously traversed by the user. We use buckets, smart digital objects that individually
manage a dynamic list of links to other buckets, to generate run-time, adaptive recommendations.



1.1 Smart Objects: Buckets 

Buckets are smart objects for the aggregation of data [7]. Buckets contain mechanisms to 
aggregate, manage, protect and preserve the data they contain. A bucket could be thought 
of as an intelligent, active folder which, among other functionalities, also contains interface 
methods to display its contents. 
Buckets are not simply passive, folder-like repositories: they have an internal structure.
A bucket may contain zero or more elements, each of which can contain elements in its own
right. An element may be a resource such as a PDF file, a data set or simply a set of other
elements. An element may also be a "pointer" to any arbitrary network object, e.g. another bucket,
in the form of a URL. By having an element "point" to other buckets, buckets can logically
contain other buckets.

Buckets have no predefined size limitations, either in terms of storage capacity, or in terms 
of number of elements. Authors can model whatever application domain they desire using 
the basic structure of elements. Bucket methods can be activated by user HTTP requests. 

As an example of how methods in a bucket are invoked, consider the bucket identified
by the URL: 

http://www.cs.odu.edu/~mln/naca-tn-2509/

When no bucket method is specified, the "display" method is assumed. Therefore the 
mentioned URL is equivalent to: 

http://www.cs.odu.edu/~mln/naca-tn-2509/?method=display

The above mentioned URLs will induce the bucket to return an overview of the elements 
it contains. These elements themselves can again be URLs containing requests for bucket 
methods. A specific bucket method allows a bucket to redirect a request for its content to 
another object, which could be another bucket. For example, the request: 

http://www.cs.odu.edu/~mln/naca-tn-2509/?method=display&re\
direct=http://naca.larc.nasa.gov/reports/1951/naca-tn-2509/

would request the odu.edu bucket to redirect to the nasa.gov bucket. All requests for ex- 
ternal resources are first routed through the bucket that contains the link. The full bucket API 
is discussed in [5]. 
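
The request conventions described above can be sketched in a few lines of Python. This is only an illustration, not the bucket implementation: the helper names bucket_request, requested_method and redirect_target are hypothetical, and the sketch percent-encodes the whole redirect value, which is stricter than the unencoded form shown in the example URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def bucket_request(bucket_url, method="display", **arguments):
    """Build a request URL for a bucket method (hypothetical helper)."""
    query = urlencode({"method": method, **arguments})
    return f"{bucket_url}?{query}"


def requested_method(url):
    """Return the method named in a bucket URL; 'display' is the default."""
    parameters = parse_qs(urlsplit(url).query)
    return parameters.get("method", ["display"])[0]


def redirect_target(url):
    """Return the decoded redirect argument, or None if there is none."""
    parameters = parse_qs(urlsplit(url).query)
    return parameters.get("redirect", [None])[0]


BUCKET = "http://www.cs.odu.edu/~mln/naca-tn-2509/"
MIRROR = "http://naca.larc.nasa.gov/reports/1951/naca-tn-2509/"

explicit = bucket_request(BUCKET)                     # names the method explicitly
redirected = bucket_request(BUCKET, redirect=MIRROR)  # asks the bucket to redirect
```

Under these assumptions, requested_method(BUCKET) and requested_method(explicit) both yield "display", mirroring the equivalence of the first two URLs in the text.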

The motivation for buckets came from previous experience in the design, implementa- 
tion and maintenance of NASA scientific and technical information Digital Libraries (DLs), 
including the NASA Technical Report Server (NTRS) [6]. Buckets are well suited for dis- 
tributed applications because they can aggregate heterogeneous content and remain functional 
in low-fidelity environments. Since they are self-contained, independent and mobile, they 
should be resilient to changing server environments. In addition, buckets can be adapted to a 
variety of data types and data formats. 

1.2 Adaptive user interfaces 

Hebb's law of learning [4], an essential component of many unsupervised methods in machine 
learning, is the basis of our efforts to generate meaningful and dynamic sets of inter-bucket 
links. We have used a descriptive methodology [3] to bucket linking, where the user interface 
changes based on the past actions of users and not on predictions of users' future actions. An 
advantage of such adaptive user interfaces is that they can dynamically take into account user 
information needs by continuously updating their structure and presentation.

1.3 Hebb's Law of Learning 

Hebb's law postulates that the connection between two neurons in the human brain becomes 
stronger when the neurons are persistently activated in quick succession to one another. As 
such the brain continuously adapts the connections between neurons based on previous expe- 
riences. Although Hebb's law represents a coarse and incomplete picture of neural plasticity, 
it has found countless applications in machine learning. Hebb's law is specifically applicable 
to situations in which no set of correct or erroneous responses can be defined in advance, and
the system needs to gradually acquire information which is only implicitly present in a given 
data set. 

For this reason, Hebbian learning has been successfully used in adaptive hypertext net- 
works [1] which learn to reroute hyperlinks according to usage patterns. In analogy to such 
systems, we use a variation of Hebbian learning for dynamic inter-bucket linking. 

2 Implementing Hebb's laws in buckets 

Fig. 1 gives an overview of how Hebbian learning can be interpreted for inter-bucket linking.
Let us imagine a user traversing three buckets, namely b1, b2 and b3. b1 is linked to b2 and
b2 is linked to b3. It is assumed that there are initially no other links among these three buckets.
When the user traverses from b1 to b2, the link (b1 → b2) is strengthened by a frequency
reinforcement. Because b2 was reached from b1, we conjecture that b2 is in turn related to b1 and
strengthen the link (b2 → b1) by a symmetry reinforcement. If the link (b2 → b1) is absent, it is
created. When the user traverses from b2 to b3, the weight of the link (b2 → b3) is increased by a
frequency reinforcement and the weight of the link (b3 → b2) is incremented by a symmetry
reinforcement. Since the user finally reached b3 from b1 with b2 as an intermediary, we assume
that b1 has some degree of relation to b3 and hence we increase the strength of the link
(b1 → b3) by a transitivity reinforcement. If the link (b1 → b3) is absent, it is created.

The approach taken to implement the above procedures was suggested in [2]. When a 
bucket b1 is expected to link to bucket b2, b1 is called with a redirect argument and this
argument gives the URL of the bucket it is linking to. We also pass a referer argument which 
essentially overrides the HTTP referer argument. The values passed by the referer argument 
would be instrumental in implementing Hebb's laws as explained below. 

When the link from b1 to b2 is traversed, b1 is called with the URL:

http://b1?method=display&referer=b1&redirect=http://b2?method\
=display%26referer=http://b1

b1 recognizes itself as the referer from the referer argument and concludes from
the redirect argument that it is redirecting to b2. Thus the weight of the link to b2 in b1 is
incremented by a given frequency reinforcement. b2 sees that the referer is b1 and increments
the weight of its link (b2 → b1) by a given symmetry reinforcement. When the user next traverses
to b3 from b2, the following link is dynamically generated:

http://b2?method=display&referer=b2&redirect=http://b1?method\
=display%26redirect=http://b3?method=display%26referer=http://b2

Figure 1: Implementing Hebbian learning in buckets

b2 sees itself as the referer and finds that b3 is the final destination based on the last
redirect argument. b2 increases the weight of its link to b3 by a given frequency reinforcement.
After incrementing the link weight, b2 redirects to b1. b1 sees that there is no referer
argument and so increases the weight of the link (b1 → b3) by the transitivity reinforcement.
Finally, when b3 is called, it finds the referer argument to be b2 and increments the weight of
the link (b3 → b2) by the symmetry reinforcement.
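
The chain of calls described above can be illustrated with a short Python sketch that builds the nested redirect URL and then unwraps it hop by hop. The function names call, traversal_url and hops are hypothetical, and the sketch percent-encodes each nesting level in full rather than encoding only the ampersand as in the URLs shown above; it is meant to show the order in which buckets are visited and which referer each one sees, not to reproduce the bucket code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def call(bucket, **arguments):
    """URL that invokes the display method of a bucket (hypothetical helper)."""
    return f"http://{bucket}?" + urlencode({"method": "display", **arguments})


def traversal_url(source, target, previous=None):
    """URL for a traversal source -> target, optionally routed via `previous`."""
    final = call(target, referer=f"http://{source}")
    if previous is not None:
        # The previous bucket is called without a referer argument,
        # which signals it to apply the transitivity reinforcement.
        final = call(previous, redirect=final)
    return call(source, referer=source, redirect=final)


def hops(url):
    """List (bucket, referer) pairs in the order the buckets are called."""
    chain = []
    while url:
        parts = urlsplit(url)
        parameters = parse_qs(parts.query)
        chain.append((parts.netloc, parameters.get("referer", [None])[0]))
        url = parameters.get("redirect", [None])[0]
    return chain


second_step = hops(traversal_url("b2", "b3", previous="b1"))
```

For the second traversal, second_step lists b2 (referer b2), then b1 (no referer), then b3 (referer http://b2), which is the sequence of frequency, transitivity and symmetry reinforcements described in the text.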

Reinforcement values are based on our experiences with previous systems: the frequency, 
symmetry and transitivity reinforcements are respectively defined as 1.0, 0.5 and 0.3. The 
frequency weight is the highest since the user directly traverses this link and we have positive 
confirmation that this link was deemed relevant. 
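
As a minimal sketch of the three reinforcement rules, the following Python code keeps all link weights in one in-memory structure. This is an assumption made for brevity: in the system described here each bucket stores its own links. The class and function names are hypothetical, and the guard that skips a transitivity update when a user returns to the bucket they came from is also an assumption, since the text does not discuss that case.

```python
from collections import defaultdict

# Reinforcement values as given in the text.
FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3


class BucketNetwork:
    def __init__(self):
        # weights[source][target] is the weight of the link source -> target;
        # a link that does not exist yet is created on its first update.
        self.weights = defaultdict(lambda: defaultdict(float))

    def traverse(self, source, target, previous=None):
        """Apply the three rules for one traversal source -> target."""
        self.weights[source][target] += FREQUENCY
        self.weights[target][source] += SYMMETRY
        if previous is not None and previous != target:
            self.weights[previous][target] += TRANSITIVITY

    def ranked_links(self, bucket):
        """Links of a bucket, heaviest first."""
        links = self.weights[bucket].items()
        return sorted(links, key=lambda link: link[1], reverse=True)


def replay(network, path):
    """Feed a user's path through the network, one traversal at a time."""
    previous = None
    for source, target in zip(path, path[1:]):
        network.traverse(source, target, previous)
        previous = source


network = BucketNetwork()
replay(network, ["b1", "b2", "b3"])
```

After replaying the path b1, b2, b3 on an empty network, b1 holds a link to b2 with weight 1.0 and a newly created link to b3 with weight 0.3, matching the walk-through of Fig. 1.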



3 Experimental Test Bed 



One hundred and fifty buckets were used for this experiment. Each bucket represented a popular
music artist, containing a short biography of the band and a dynamic list of related links
to other buckets in the network. The list of 150 music bands was composed of the top 50
bands of all time as chosen by experts from Spin Magazine [9] and two similar
bands for each, as suggested by www.allmusic.com. Each bucket was initially randomly linked to 3
<element wt="3.5" id="http://128.82.7.113/ruarauind/test2/b13/">
  <metadata>
    <descriptive>
      <title>Cream</title>
    </descriptive>
    <administrative />
  </metadata>
</element>

Figure 2: XML representation of an element and the associated weight
Figure 3: An example of bucket display



other buckets with a weight of 0.5 to provide an initial unbiased navigation structure. Fig. 3
shows the display of one such bucket. The bucket displays metadata related to the band and a
set of links to other artists/bands. As users traverse the system, new links are created and the
weights of pre-existing links are increased based on the users' link selections. The set of links
is sorted by weight so that a heavily weighted link is shown higher up in the list
of links than a less heavily weighted link.

Every bucket has an XML file, which contains all information about the elements in the 
bucket and their metadata. The weight of each individual link is stored as an attribute of the 
element (URL it is pointing to) in the XML file, as shown in Fig. 2. New elements can be 
added to the XML file using the addElement method. 
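
To illustrate how weights stored as XML attributes could drive the sorted link list, here is a hedged Python sketch using the standard xml.etree.ElementTree module. The root tag bucket, the sample URLs and band titles, and the functions ranked_links and reinforce are all assumptions made for this example; only the element, metadata, descriptive, title and administrative tags and the wt and id attributes come from Fig. 2. The reinforce function stands in for whatever the bucket does internally and is not the addElement method itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical bucket file; the root tag and the values are illustrative only.
SAMPLE = """<bucket>
  <element wt="0.5" id="http://example.org/buckets/a/">
    <metadata><descriptive><title>Band A</title></descriptive>
      <administrative /></metadata>
  </element>
  <element wt="3.5" id="http://example.org/buckets/b/">
    <metadata><descriptive><title>Band B</title></descriptive>
      <administrative /></metadata>
  </element>
</bucket>"""


def ranked_links(xml_text):
    """Return (title, url, weight) tuples, heaviest link first."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter("element"):
        title = element.findtext("metadata/descriptive/title", default="")
        links.append((title, element.get("id"), float(element.get("wt", "0"))))
    return sorted(links, key=lambda link: link[2], reverse=True)


def reinforce(xml_text, target_id, amount):
    """Add `amount` to the weight of a link, creating the link if absent."""
    root = ET.fromstring(xml_text)
    for element in root.iter("element"):
        if element.get("id") == target_id:
            element.set("wt", str(float(element.get("wt", "0")) + amount))
            break
    else:
        # No element points to target_id yet, so a new one is appended.
        ET.SubElement(root, "element", {"wt": str(amount), "id": target_id})
    return ET.tostring(root, encoding="unicode")


updated = reinforce(SAMPLE, "http://example.org/buckets/a/", 4.0)
```

In this sample, Band B is listed first before the update and Band A (weight 4.5) is listed first afterwards.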

An invitation to traverse the network was sent to 15 people in June 2003. The total weight 
associated with all the links excluding the initial random links was 1719 units at the end of 
the experiment. Taking into account the reinforcements assigned for frequency, symmetry 
and transitivity we estimate the system to have had approximately 1041 direct traversals. 
Example Bucket: 'The Clash'

Links before traversal    Links after traversal
The Beatles               Smashing Pumpkins
Glyn Jones                Beck
Beck                      Fishbone
                          Nick Lowe
                          The Beatles
                          The Smiths
                          Replacements
                          Glyn Jones
                          N.W.A
                          Squeeze

Table 1: An example of the dynamic links generated for 'The Clash'.



Example Bucket: 'The Smiths'

Links before traversal    Links after traversal
Elvis Costello            Replacements
Tool                      Elvis Costello
Replacements              Pretty Things
                          Fishbone
                          Nick Lowe
                          The Beatles
                          Fishbone
                          The Clash
                          Johnny Thunders
                          Kiss
                          Tool

Table 2: An example of the dynamic links generated for 'The Smiths'.
4 Results



Our aim is to show that when users start surfing the collection from a portal node or bucket,
a meaningful network develops in which the content of highly centric nodes is similar or
related to the content of the portal bucket.

4.1 Link Structure 

The bucket representing 'Public Enemy' was the entry point to the network. This setup is
similar to a portal through which users access web services (e.g. www.yahoo.com). The
portal bucket and every heavily traversed bucket start to reflect the users' preferences about
which other buckets should be linked to the current bucket. Table 1 shows the links associated with
'The Clash' bucket before and after traversal. As users navigate the bucket, new links are
dynamically created and the bands which users presume are more related to 'The Clash' bubble
up to the top.

Another similar example is shown in Table 2. It is evident that users do not associate
'Tool' (an initial random addition) with 'The Smiths', and hence it has dropped down in the
list of links compared to 'Replacements' and 'Elvis Costello', which are also random initial
links but have maintained their positions at the top of the list and do match the nature of The
Smiths as a music band.

4.2 Bucket Authority 

A highly influential node within a network can be expected to have a relatively high number
of outgoing and incoming connections to and from other nodes in the network, a characteristic
referred to as degree centrality. An investigation into the degree centrality of nodes in a
network will reveal the network's most important nodes. Applied to the generated network of
buckets, degree centrality can therefore be used to partially validate network structure. Since
our buckets concern music bands, degree centrality may relate to the relative importance or
influence of music bands according to the community of users that generated the network.

We define the degree centrality of a node as the number of links that originate from or 
terminate in that particular node. The weighted degree centrality of a node is computed as 
the sum of all the weights of the links that originate from or terminate in that particular node. 

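Degree centrality $dc_i$ is defined in Eq. 1, where $l_{ij}$ is 1 if there exists a link from bucket
$i$ to bucket $j$ and 0 otherwise. Weighted degree centrality $wc_i$ is defined in Eq. 2, where
$w_{ij}$ is the weight of the link from bucket $i$ to bucket $j$; $w_{ij}$ is 0 if bucket $i$ is
not linked to bucket $j$. In both equations $n$ is the number of buckets in the network.

\[ dc_i = \sum_{j=1}^{n} l_{ij} + \sum_{j=1}^{n} l_{ji} \tag{1} \]

\[ wc_i = \sum_{j=1}^{n} w_{ij} + \sum_{j=1}^{n} w_{ji} \tag{2} \]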
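
The two centrality measures defined above (links, or link weights, that originate from or terminate in a node) can be computed as in the following Python sketch. The nested-dictionary representation, the function names and the three-node example network are assumptions made for illustration and are not data from the experiment.

```python
def degree_centrality(weights, node):
    """Number of links that originate from or terminate in `node`."""
    outgoing = sum(1 for weight in weights.get(node, {}).values() if weight > 0)
    incoming = sum(1 for targets in weights.values() if targets.get(node, 0) > 0)
    return outgoing + incoming


def weighted_degree_centrality(weights, node):
    """Sum of the weights of all links into and out of `node`."""
    outgoing = sum(weights.get(node, {}).values())
    incoming = sum(targets.get(node, 0) for targets in weights.values())
    return outgoing + incoming


# Hypothetical network: example[i][j] is the weight of the link i -> j.
example = {
    "a": {"b": 1.0, "c": 0.3},
    "b": {"a": 0.5, "c": 1.0},
    "c": {"b": 0.5},
}

ranking = sorted(example, key=lambda node: weighted_degree_centrality(example, node),
                 reverse=True)
```

In this small example node b has four incident links (two outgoing, two incoming) with a total weight of 3.0, so it ranks first under both measures.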
[Figure 4: bar chart; vertical axis: Degree Centrality, horizontal axis: Rank. Legend: Patti Smith Group, Tool, Parliament Funkadelic, The Velvet Underground, Sonic Youth, Craig Armstrong, Minutemen, Love and Rockets.]

Figure 4: Top eight degree centrality rankings based on initial random linking.
[Figure 5: bar chart; vertical axis: Degree Centrality, horizontal axis: Rank. Bar labels as extracted: 42, 30, 30, 30, 29, 29, 28, 27. Legend: Public Enemy, The Velvet Underground, Fishbone, Sonic Youth, The Stooges, Parliament Funkadelic, The Pixies, Tool.]

Figure 5: Top eight degree centrality rankings after users surfed the system.


Fig. 4 shows the ranking of the top 8 buckets based on degree centrality when the
network was initially set up. The rankings in this case are purely random, since no user traversals
had taken place. In this case, the degree and weighted degree centrality rankings are the same.
Fig. 5 shows the top 8 rankings based on degree centrality after approximately 1041 direct
traversals by 15 users. 'Public Enemy' is seen to be the most popular band according to
degree and weighted degree centrality measures. This was expected since 'Public Enemy'
was the access point to the network for all users. We also find influential bands such as
"The Velvet Underground", "The Stooges" and "L.L. Cool J."

4.3 Hierarchical Ranking 

Since all users entered the network starting at the "Public Enemy" bucket, it makes sense to
investigate this bucket's connections to other buckets as a means to validate network structure.
Fig. 7 gives the hierarchy of the most popular bands starting from "Public Enemy", including
secondary, tertiary and reinforced but initially random links. The weight of the link connecting
every two bands is noted within parentheses next to the band lower in the hierarchy. Each
band marked with * is an initial random link which has been reinforced.
[Figure 6: bar chart; vertical axis: Weighted Degree Centrality, horizontal axis: Rank. Legend: Public Enemy, Sonic Youth, The Pixies, The Stooges, Smashing Pumpkins, The Velvet Underground, Fishbone, LL Cool J.]

Figure 6: Top eight weighted degree centrality rankings after users surfed the system.

The 150 bands were graded by two music experts on a scale from 10 to 0, with 10 signifying
a close relation between the band and "Public Enemy" and 0 signifying no relationship
between the bands. The grades were later normalized to a scale of 0 to 1. We compute the
relationship weight between every band in Fig. 7 and 'Public Enemy' as the sum, over all
connecting paths, of the product of the normalized intermediary link weights along each path,
in order to compare network weights to the expert opinion.

We can formalize this procedure as follows. Assume two buckets $b_i$ and $b_j$ are connected
in the network shown in Fig. 7 via a path $p$ of length $n$, so that the ordered set
$p = (b_1, b_2, \ldots, b_n)$, with $b_1 = b_i$ and $b_n = b_j$, represents the buckets on the
path that connects $b_i$ and $b_j$. Multiple paths can be identified between any two buckets,
so we have a set of $k$ paths $P = \{p_1, p_2, \ldots, p_k\}$.

Eq. 3 is used to compute the weight of the relationship between any bucket $b_i$ and the bucket
$b_j$ in the generated hierarchical tree, given that $W(b_h \in p_g, b_{h-1} \in p_g)$ represents
the weight of the link between the bucket $b_h$ and its predecessor $b_{h-1}$ in path $p_g$.

\[ W(b_i, b_j) = \sum_{g=1}^{k} \prod_{h=2}^{n} W(b_h \in p_g, b_{h-1} \in p_g) \tag{3} \]

We examined the indirect link weights between each bucket and the "Public Enemy" bucket, so
$b_i$ is assumed to be the "Public Enemy" bucket in all cases.
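
The sum-of-products computation in Eq. 3 can be sketched in Python as follows. Two assumptions are made here: paths are enumerated as simple (cycle-free) paths in a directed graph of normalized weights, whereas the paper reads them off the hierarchy of Fig. 7, and the small example network with nodes pe, x, y and z is invented for illustration.

```python
from math import prod


def path_weight(weights, path):
    """Product of the link weights along one path (the inner product of Eq. 3)."""
    return prod(weights[a][b] for a, b in zip(path, path[1:]))


def simple_paths(weights, start, goal, visited=None):
    """Yield every cycle-free path from start to goal as a tuple of nodes."""
    visited = (visited or ()) + (start,)
    if start == goal:
        yield visited
        return
    for successor in weights.get(start, {}):
        if successor not in visited:
            yield from simple_paths(weights, successor, goal, visited)


def relationship_weight(weights, start, goal):
    """Sum of the path products over all paths (the outer sum of Eq. 3)."""
    return sum(path_weight(weights, path)
               for path in simple_paths(weights, start, goal))


# Hypothetical normalized weights; example[a][b] is the weight of link a -> b.
example = {
    "pe": {"x": 0.5, "y": 0.4},
    "x": {"z": 0.5},
    "y": {"z": 0.25},
}

indirect = relationship_weight(example, "pe", "z")
```

Here z is reached from pe along two paths, contributing 0.5 × 0.5 and 0.4 × 0.25, so the relationship weight is 0.35 up to floating-point rounding.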

Fig. 8 shows the scatterplot of the expert and network relationship values to 'Public Enemy'.
The correlation coefficient between network and expert evaluations of bucket relations
to "Public Enemy" was found to be 0.48, indicating that relationships in the network correspond,
at least to some degree, to the judgments of the two experts. The expert opinions need further
validation, and with more usage the network could be expected to better reflect user tastes.
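
The text does not state which correlation coefficient was used. Assuming the common Pearson product-moment coefficient, the comparison between normalized expert grades and network relationship weights could be computed as in the sketch below. The function names are hypothetical and no data from the experiment is reproduced.

```python
from math import sqrt


def normalize_grades(grades, top=10):
    """Map expert grades on a 0..top scale onto the range 0..1."""
    return [grade / top for grade in grades]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)
```

A call such as pearson(normalize_grades(expert_grades), network_weights), with both lists ordered by band, would then give a single value between -1 and 1.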
[Figure 7: tree diagram rooted at 'Public Enemy'. Each node is labeled with a band name followed, in parentheses, by the weight of the link to its parent; * marks an initial random link that has been reinforced.]

Figure 7: Hierarchical ranking of bands related to 'Public Enemy'.




[Figure 8: scatterplot; horizontal axis: expert (0.1 to 1).]

Figure 8: Comparison of expert and network ranking of band relationships to 'Public Enemy'.



5 Future Research 



While the importance of the nodes has been gauged based on degree and weighted degree 
centrality measures, it would be interesting to perform an analysis based on principal compo- 
nents and other clustering techniques. 

Another system feature could be decrementing the weight of rarely used links. This would 
help filter out spurious links created by the initial random linking. The basis for decrementing 
the link weight needs further study. Options include the time for which a link has not been 
accessed and the frequency of access of other links. 

Finally, Hebbian learning could be implemented on a portion of a bucket instead of the
entire bucket. This would allow a bucket to have sections of content that are fixed and
sections of content that can adapt by Hebbian learning. Imagine bucket1 containing two
high-level elements/sections (HLE1 and HLE2), each with a number of leaf elements. When the
user links from bucket1 to bucket2 via a link provided in HLE1, bucket2 would be aware
of not only the bucket it was linked from (bucket1) but also the section (HLE1) within that
bucket.

6 Conclusion 

We have implemented a system for the automated linking of information using a collection of
smart objects, called buckets, and a set of simple learning rules which change link weights
based on user retrieval patterns. The bucket networks gradually change structure as users
retrieve one bucket after another via a list of recommended buckets.

It is evident from the results that although a collection of buckets is initially randomly
linked, with adequate user traversal the buckets form a meaningful linkage which resembles the
users' idea of which buckets should be related to which other buckets. The most centric nodes in
the network happen to be either influential or very popular music bands related to "Public
Enemy". These bands have high degree and weighted degree centralities.

It was found in the course of the analysis that rankings based on degree centrality were more
susceptible to change due to heavy use, even by a single user. Weighted degree
centrality, however, offers a more graded and stable approach.

The random collection initially presented to the users would not return the ideal results
needed to satisfy the users' information needs. The likelihood of the system returning the
ideal answer set for a user's information need increases with usage of the system. The usage
needed to create a well-suited network depends on the number of buckets in the network and
also on the diversity of the users' information needs. When the users' information needs are of
limited scope (e.g. all users interested in rock music), a meaningful network can be expected
to form fairly quickly.